Conversation
@@ -249,8 +348,9 @@ void BinaryBroadcastBackwardUseIn(const nnvm::NodeAttrs& attrs,
                                  const std::vector<OpReqType>& req,
                                  const std::vector<TBlob>& outputs) {
  TShape new_lshape, new_rshape, new_oshape;
-  bool need_bc = BinaryBroadcastShapeCompact(outputs[0].shape_, outputs[1].shape_, inputs[0].shape_,
-                                             &new_lshape, &new_rshape, &new_oshape);
+  const bool need_bc = BinaryBroadcastShapeCompact(outputs[0].shape_,
There was a problem converting this to bool here.
        s, new_oshape.Size(), req[0], lstride, rstride, oshape,
        inputs[0].dptr<DType>(), inputs[1].dptr<DType>(), outputs[0].dptr<DType>(),
        inputs[0].Size(), inputs[1].Size());
  mshadow::Shape<NDim> oshape = new_oshape.get<NDim>();
This makes my IDE stop complaining, because it can't otherwise figure out the namespace.
Force-pushed from 78844b7 to c8446f1
    inc(&coord, oshape, &lidx, lstride, &ridx, rstride);
    KERNEL_ASSIGN(out[base+i], req, OP::Map(lhs[lidx], rhs[ridx]));
                              DType* out) {
  if (req != kNullOp) {
If req is kNullOp, the kernel shouldn't be launched at all.
Currently in BinaryBroadcastCompute there's no check for req -- the kernel is called anyway. This is typical for most calls like this (nnvm unary or binary ops). In some cases it's done indirectly with the Req switch, but you said that wasn't worth the compile time and binary size.
Added a check to the Compute call in the "remove broadcast" commit.
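For reference, a minimal sketch of the kind of guard being discussed, assuming the usual MXNet types (`OpReqType`, `kNullOp`, `mxnet_op::Kernel`) are in scope; the wrapper name and parameters are illustrative, not the code added in this PR:

```cpp
// Sketch only: skip the launch entirely when nothing will be written.
// Assumes OpReqType/kNullOp and mxnet_op::Kernel from the MXNet headers.
template<typename OP, typename xpu, typename... Args>
inline void LaunchIfNeeded(mshadow::Stream<xpu>* s, const OpReqType req,
                           const size_t N, Args... args) {
  if (req == kNullOp) {
    return;  // output isn't written, so don't pay for the kernel (or OMP) at all
  }
  mxnet::op::mxnet_op::Kernel<OP, xpu>::Launch(s, N, args...);
}
```

This keeps a single runtime check instead of the Req-switch dispatch mentioned above, avoiding the extra compile time and binary size.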
 * \return true if OMP parallelization should be used for the N iterations
 */
template<typename ...Args>
static bool UseOMP(const size_t N, const size_t thread_count, OpReqType req, Args... args) {
This looks wildly too complicated for a broadcasting kernel
“Wildly”? OK, so it won’t support broadcast. I’ll remove it.
broadcast support removed
OMP overhead this run: ~8500 ns. Unary and binary op times: 0.5-200 ns.
OperatorTuneBase::duration_t OperatorTuneBase::omp_overhead_ns_ = 8495;
Resetting commit in order to cleanly remove broadcast
Force-pushed from 296e502 to 56acc81
* Refreshed branch bc_tune
* local-build openmp as static
* trigger
* Somehow broadcast found its way back in, removed again
* Trigger rebuild
@piiswrong
Description
Automatic OMP operator tuning based upon kernel operation workload.
The tuner determines the "weight" of a unary or binary kernel op and then uses it, together with the number of iterations required and the number of threads available to perform the job, to decide whether OMP should be used; a rough sketch of that decision follows below.
Decision accuracy is tested in the gtest OMP_TUNING test suite by comparing with-OMP, without-OMP, and Auto times.
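The shape of that decision can be sketched roughly as follows; names such as `TuneSketch` and `ShouldUseOMP` are illustrative and not the PR's actual API:

```cpp
#include <cstddef>
#include <cstdint>

// Rough sketch only: parallelize when the time saved by splitting the work
// across threads outweighs the measured OMP fork/join overhead.
struct TuneSketch {
  // Measured once at startup (~8500 ns in the run quoted in this PR).
  static uint64_t omp_overhead_ns;

  // `weight_ns` stands for the tuned per-iteration cost of the kernel op
  // (the 0.5-200 ns range quoted above for unary/binary ops).
  static bool ShouldUseOMP(size_t N, size_t thread_count, double weight_ns) {
    if (thread_count < 2) return false;   // nothing to parallelize with
    const double serial_ns   = static_cast<double>(N) * weight_ns;
    const double parallel_ns = serial_ns / thread_count + omp_overhead_ns;
    return parallel_ns < serial_ns;       // OMP only if it pays for its overhead
  }
};
uint64_t TuneSketch::omp_overhead_ns = 8495;
```

With the figures quoted in this PR (overhead of roughly 8500 ns, per-iteration cost of 0.5-200 ns), small tensors fall well below the break-even point and stay single-threaded.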
For example, measured decision success rates:
AWS c4.8xlarge (36 vCores):
Success rate for type float: 0.90278
Success rate for type double: 0.88889
Success rate for type mshadow::half::half_t: 0.83333
Success rate for type unsigned char: 0.86111
Success rate for type int: 0.95833
Success rate for type long: 0.88889
Desktop: 12 logical cores (6 physical CPU cores + hyperthreading):
Success rate for type float: 0.79167
Success rate for type double: 0.75000
Success rate for type unsigned char: 0.72222
Success rate for type int: 0.94444
Success rate for type long: 1.00000
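As a rough illustration of what "success rate" means here, each workload can be timed with OMP forced on, forced off, and with the automatic decision, and the Auto run counted as a success when it tracks the faster forced mode. The helper names below are hypothetical; this is not the actual gtest code:

```cpp
#include <algorithm>
#include <chrono>

// Illustrative only: time a workload, then compare the automatic decision
// against both forced modes.
template<typename F>
double TimeMs(F&& fn) {
  const auto start = std::chrono::steady_clock::now();
  fn();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}

template<typename F1, typename F2, typename F3>
bool AutoDecisionSucceeded(F1&& run_forced_omp, F2&& run_forced_serial, F3&& run_auto) {
  const double omp_ms    = TimeMs(run_forced_omp);
  const double serial_ms = TimeMs(run_forced_serial);
  const double auto_ms   = TimeMs(run_auto);
  // Success: the automatic choice is no slower than the better forced mode,
  // within a small (arbitrary) tolerance.
  return auto_ms <= std::min(omp_ms, serial_ms) * 1.05;
}
```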
A sample output from the OMP_TUNING tests, including statistical data:
tune_all.txt
Currently autotuned kernel operators (tuning at startup takes a total of ~3 ms):
mxnet::op::PopulateFullIdxRspKernel
mxnet::op::mxnet_op::set_to_int<0>
mxnet::op::mshadow_op::smooth_l1_gradient
mxnet::op::mshadow_op::smooth_l1_loss
mxnet::op::mshadow_op::eq
mxnet::op::mshadow_op::ne
mxnet::op::mshadow_op::le
mxnet::op::mshadow_op::lt
mxnet::op::mshadow_op::hypot_grad_right
mxnet::op::mshadow_op::hypot_grad_left
mxnet::op::mshadow_op::hypot
mxnet::op::mshadow_op::arctanh_grad
mxnet::op::mshadow_op::arctan_grad
mxnet::op::mshadow_op::cosh
mxnet::op::mshadow_op::rpower
mxnet::op::mshadow_op::minimum
mxnet::op::mshadow_op::arctan
mxnet::op::mshadow_op::reciprocal_square_root
mxnet::op::mshadow_op::rminus
mxnet::op::mshadow_op::arccosh_grad
mxnet::op::mshadow_op::square_root_grad
mxnet::op::mshadow_op::arctanh
mxnet::op::mshadow_op::floor
mxnet::op::mshadow_op::cosh_grad
mxnet::op::mshadow_op::ceil
mxnet::op::mshadow_op::cos_grad
mxnet::op::mshadow_op::reciprocal_cube_root_grad
mxnet::op::mshadow_op::arcsinh_grad
mxnet::op::mshadow_op::sin
mxnet::op::mshadow_op::arcsin
mxnet::op::mshadow_op::log10_grad
mxnet::op::mshadow_op::log1p_grad
mxnet::op::mshadow_op::mod_grad
mxnet::op::mshadow_op::arccos_grad
mxnet::op::mshadow_op::exp
mxnet::op::mshadow_op::tanh_grad
mxnet::op::mshadow_op::log1p
mxnet::op::mshadow_op::rint
mshadow::op::minus
mxnet::op::mshadow_op::relu_grad
mxnet::op::mshadow_op::identity
mxnet::op::mshadow_op::maximum
mxnet::op::mshadow_op::reciprocal_grad
mshadow::op::div
mxnet::op::mshadow_op::rmod_grad
mxnet::op::mshadow_op::arcsin_grad
mxnet::op::mshadow_op::ge
mxnet::op::mshadow_op::gammaln_grad
mxnet::op::mshadow_op::sigmoid
mxnet::op::mshadow_op::power_rgrad
mxnet::op::mshadow_op::identity_grad
mxnet::op::mshadow_op::tan
mxnet::op::mshadow_op::gamma
mxnet::op::mshadow_op::arcsinh
mshadow::op::identity
mxnet::op::mshadow_op::square_root
mxnet::op::mshadow_op::reciprocal_square_root_grad
mxnet::op::mshadow_op::cos
mxnet::op::mshadow_op::log2
mxnet::op::mshadow_op::tanh
mxnet::op::mshadow_op::arccosh
mxnet::op::mshadow_op::negation
mxnet::op::mshadow_op::log10
mxnet::op::mshadow_op::cube_root_grad
mxnet::op::mshadow_op::expm1
mxnet::op::mshadow_op::arccos
mxnet::op::mshadow_op::rmod
mxnet::op::mshadow_op::softrelu_grad
mxnet::op::mshadow_op::sinh
mxnet::op::mshadow_op::log_grad
mxnet::op::mshadow_op::sin_grad
mxnet::op::mshadow_op::rdiv_grad
mxnet::op::mshadow_op::log
mxnet::op::mshadow_op::softrelu
mxnet::op::mshadow_op::square_grad
mxnet::op::mshadow_op::log2_grad
mxnet::op::mshadow_op::cube_root
mxnet::op::mshadow_op::reciprocal_cube_root
mxnet::op::mshadow_op::sign
mxnet::op::mshadow_op::square
mxnet::op::mshadow_op::sign_grad
mxnet::op::mshadow_op::round
mxnet::op::mshadow_op::trunc
mxnet::op::mshadow_op::mod_rgrad
mxnet::op::mshadow_op::reciprocal
mxnet::op::mshadow_op::fix
mxnet::op::mshadow_op::gamma_grad
mxnet::op::mshadow_op::gammaln
mxnet::op::mshadow_op::degrees
mshadow::op::right
mxnet::op::mshadow_op::sinh_grad
mxnet::op::mshadow_op::degrees_grad
mshadow::op::plus
mxnet::op::mshadow_op::radians
mxnet::op::mshadow_op::sigmoid_grad
mxnet::op::mshadow_op::radians_grad
mxnet::op::mshadow_op::gt
mxnet::op::mshadow_op::mod
mshadow::op::mul
mxnet::op::mshadow_op::rdiv
mxnet::op::mshadow_op::tan_grad
mxnet::op::mshadow_op::div_grad
mxnet::op::mshadow_op::div_rgrad
mxnet::op::mshadow_op::left
mxnet::op::mshadow_op::right
mxnet::op::mshadow_op::power
mxnet::op::mshadow_op::power_grad
mxnet::op::mshadow_op::relu
mxnet::op::mshadow_op::abs
mxnet::op::mshadow_op::rpower_grad
Checklist
Essentials
make lint
Changes
Comments